Aligning Sentences in Bilingual Texts French-English and French-Arabic

Authors

  • Fathi Debili
  • Elyès Sammouda
Abstract

In this paper, we will tackle the problem raised by the automatic alignment of sentences belonging to bilingual text pairs. The method that we advocate here is inspired by what a person with a fair knowledge of the other language would do intuitively. It is based on the matching of the elements which are similar in both sentences. However, to match these elements correctly, we first have to match the sentences that contain them. There seems to be a vicious circle here. We will show how to break it. On the one hand, we will describe the hypotheses we made, and, on the other hand, the algorithms which ensued. The experiments are carried out with French-English and French-Arabic text pairs. We will show that matching sentences and, later, expressions, amounts to raising a new problem in the machine translation field, i.e. the problem of recognition instead of that of translation, strictly speaking.

ACKNOWLEDGMENTS: The work presented here benefited from the help of many people. We thank them all, in particular E. Souissi and A. Zribi for their contributions; E. Mackaay, L. Naddeo-Souriau, J.-L. Lemoigne and the journal Pour la Science for kindly agreeing to give us monolingual or bilingual texts or text fragments on diskette; J. Kouloughli and J.-B. Berthelin for the discussions and criticism they are always ready to offer. This research was funded in part by the Réseau des Industries de la Langue (ACCT contract no. 338/SG/C5) and in part by the MRT (grant decision no. 90.K.6434).

RÉSUMÉ: In this paper we address the problem of automatically aligning sentences that belong to bilingual text pairs. The method we advocate is inspired by what a person with a moderate knowledge of the other language would do intuitively. It relies on matching the elements that make up the facing sentences. Yet to match these elements correctly, the sentences that contain them must already have been matched. This is an apparent vicious circle. We show how to break it. We describe the hypotheses we make on the one hand, and the algorithms that follow from them on the other. The experiments are carried out on the French-English and French-Arabic language pairs. We show that matching sentences and, in the following step, expressions, amounts to posing a new problem in machine translation …

Similar resources

A Program for Aligning Sentences in Bilingual Corpora

Researchers in both machine translation (e.g., Brown et al., 1990) and bilingual lexicography (e.g., Klavans and Tzoukermann, 1990) have recently become interested in studying parallel texts, texts such as the Canadian Hansards (parliamentary proceedings) which are available in multiple languages (French and English). This paper describes a method for aligning sentences in these parallel texts,...

K-vec: A New Approach for Aligning Parallel Texts

Various methods have been proposed for aligning texts in two or more languages such as the Canadian Parliamentary Debates (Hansards). Some of these methods generate a bilingual lexicon as a by-product. We present an alternative alignment strategy which we call K-vec, that starts by estimating the lexicon. For example, it discovers that the English word fisheries is similar to the French pêches ...

A Program for Aligning Sentences in Bilingual Corpora

Researchers in both machine translation (e.g., Brown et al., 1990) and bilingual lexicography (e.g., Klavans and Tzoukermann, 1990) have recently become interested in studying bilingual corpora, bodies of text such as the Canadian Hansards (parliamentary proceedings) which are available in multiple languages (such as French and English). One useful step is to align the sentences, that is, to id...

Aligning Sentences in Parallel Corpora

In this paper we describe a statistical technique for aligning sentences with their translations in two parallel corpora. In addition to certain anchor points that are available in our data, the only information about the sentences that we use for calculating alignments is the number of tokens that they contain. Because we make no use of the lexical details of the sentence, the alignment compu...

LIUM SMT Machine Translation System for WMT 2010

This paper describes the development of French–English and English–French machine translation systems for the 2010 WMT shared task evaluation. These systems were standard phrase-based statistical systems based on the Moses decoder, trained on the provided data only. Most of our efforts were devoted to the choice and extraction of bilingual data used for training. We filtered out some bilingual ...

Publication date: 1992